Add mz_cluster_deployment_lineage index+view#30706
Conversation
src/catalog/src/builtin.rs
Outdated
| * process, but someone turning on a new use case that happens to have the same | ||
| * name as a previous but logically distinct use case. | ||
| */ | ||
| pub static MZ_CLUSTER_BLUE_GREEN_LINEAGE: LazyLock<BuiltinView> = LazyLock::new(|| BuiltinView { |
There was a problem hiding this comment.
@benesch I figured you'd be the best person to review it. Pretty much the same from your original query here https://github.com/MaterializeInc/console/issues/3345#issue-2589359640 except for a few changes:
- Instead of a search, just does it for all clusters in
mz_clusters - Maps each id to the current deployment. Similar format to
mz_object_transitive_dependencies
There was a problem hiding this comment.
I only skimmed, but if you only made the changes you described, LGTM!
| ## `mz_cluster_blue_green_lineage` | ||
|
|
||
| The `mz_cluster_blue_green_lineage` table shows the blue/green deployment lineage of all clusters in [`mz_clusters`](../mz_catalog/#mz_clusters). It determines all cluster IDs that are logically the same cluster. |
There was a problem hiding this comment.
I'm finding this name/description to be a little confusing. Does it only contain data on clusters that actively undergoing a blue/green deployment, or does it contain historical data about clusters that have gone through blue/green deployments in the past?
There was a problem hiding this comment.
Does it only contain data on clusters that actively undergoing a blue/green deployment, or does it contain historical data about clusters that have gone through blue/green deployments in the past?
Ah it contains historical data about clusters that have gone through blue/green deployments in the past, as well as current clusters! Brings a good point that I should probably make a testdrive/SLT test for this. Converted to draft to make a test
There was a problem hiding this comment.
Created a test in test/sqllogictest/cluster_blue_green_lineage.slt. Hopefully it better describes what I'm aiming for! I wonder if it makes sense to change the field description of cluster_id to "The ID of a cluster, regardless of whether it has been dropped or not. Corresponds to mz_clusters.id." and cluster_name to current_deployment_cluster_name? Then maybe explicitly call out that current_deployment_cluster_id must exist.
| /** | ||
| * Traces the blue/green deployment lineage in the audit log to determine all cluster | ||
| * IDs that are logically the same cluster. | ||
| * cluster_id: The ID of a cluster. | ||
| * current_deployment_cluster_id: The cluster ID of the last cluster in | ||
| * cluster_id's blue/green lineage. | ||
| * cluster_name: The name of the cluster. | ||
| * The approach taken is as follows. First, find all extant clusters and add them | ||
| * to the result set. Per cluster, we do the following: | ||
| * 1. Find the most recent create or rename event. This moment represents when the | ||
| * cluster took on its final logical identity. | ||
| * 2. Look for a cluster that had the same name (or the same name with `_dbt_deploy` | ||
| * appended) that was dropped within one minute of that moment. That cluster is | ||
| * almost certainly the logical predecessor of the current cluster. Add the cluster | ||
| * to the result set. | ||
| * 3. Repeat the procedure until a cluster with no logical predecessor is discovered. | ||
| * Limiting the search for a dropped cluster to a window of one minute is a heuristic, | ||
| * but one that's likely to be pretty good one. If a name is reused after more | ||
| * than one minute, that's a good sign that it wasn't an automatic blue/green | ||
| * process, but someone turning on a new use case that happens to have the same | ||
| * name as a previous but logically distinct use case. | ||
| */ |
There was a problem hiding this comment.
We don't need to block this PR, but maybe we should add explicit information to audit events that relate to blue-green deployments that would allow us to more easily and correctly find this information.
There was a problem hiding this comment.
Potentially! Though the audit log migration from my experience might be brutal itself 😬 . There's been talk about this however https://materializeinc.slack.com/archives/C06GZ7GBKB5/p1731957134123909 but adding the WMR view seemed like the easiest thing to do for now!
| | `last_status_change_at` | [`timestamp with time zone`] | Wall-clock timestamp of the connection status change.| | ||
| | `status` | [`text`] | | The status of the connection: one of `pending-service-discovery`, `creating-endpoint`, `recreating-endpoint`, `updating-endpoint`, `available`, `deleted`, `deleting`, `expired`, `failed`, `pending`, `pending-acceptance`, `rejected`, or `unknown`. | | ||
|
|
||
| ## `mz_cluster_blue_green_lineage` |
There was a problem hiding this comment.
Can we call this "deployment" instead of "blue/green"? I think that's a bit cleaner/clearer!
| ## `mz_cluster_blue_green_lineage` | |
| ## `mz_cluster_deployment_lineage` |
| <!-- RELATION_SPEC mz_internal.mz_cluster_blue_green_lineage --> | ||
| | Field | Type | Meaning | | ||
| |-------------------------------------|--------------|----------------------------------------------------------------| | ||
| | `cluster_id` | [`text`] | The ID of the cluster. Corresponds to [`mz_clusters.id`](../mz_catalog/#mz_clusters). | |
There was a problem hiding this comment.
| | `cluster_id` | [`text`] | The ID of the cluster. Corresponds to [`mz_clusters.id`](../mz_catalog/#mz_clusters). | | |
| | `cluster_id` | [`text`] | The ID of the cluster. Corresponds to [`mz_clusters.id`](../mz_catalog/#mz_clusters) (though the cluster may no longer exist). | |
src/catalog/src/builtin.rs
Outdated
| * process, but someone turning on a new use case that happens to have the same | ||
| * name as a previous but logically distinct use case. | ||
| */ | ||
| pub static MZ_CLUSTER_BLUE_GREEN_LINEAGE: LazyLock<BuiltinView> = LazyLock::new(|| BuiltinView { |
There was a problem hiding this comment.
I only skimmed, but if you only made the changes you described, LGTM!
We need this view+index to determine the logical identity of clusters between blue green deployments in the Console.
We need this view+index to determine the logical identity of clusters between blue green deployments in the Console.
The followup Console PR is here https://github.com/MaterializeInc/console/pull/3556
Motivation
https://github.com/MaterializeInc/database-issues/issues/10225
Tips for reviewer
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel.